Support newer versions of MedCalc-Bench #3921

chakravarthik27 · 2025-11-12T09:09:18Z

This pull request updates the MedCalc-Bench scenario to use the latest dataset version and adds a basic test for the scenario. The main changes focus on keeping the dataset reference current and improving test coverage.

MedCalc-Bench scenario updates:

Updated the dataset reference in both the class docstring and the code to use MedCalc-Bench-v2.0 instead of v1.0 in medcalc_bench_scenario.py.
MedCalc-Bench-v1.0 changed to MedCalc-Bench-v2.0

Testing improvements:

Added a new test file test_medcalc_bench_scenario.py with a pytest-based test that verifies the scenario loads instances and that the first instance is from the "test" split.
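A minimal sketch of the shape such a test might take. The stub classes below are hypothetical stand-ins so the example runs without downloading the dataset; the real test file in this PR is not shown in the thread and may differ.

```python
from dataclasses import dataclass
from typing import List

TEST_SPLIT = "test"  # split name quoted in the PR description


@dataclass(frozen=True)
class StubInstance:
    """Hypothetical stand-in for a scenario instance (not the HELM class)."""

    text: str
    split: str


class StubMedCalcBenchScenario:
    """Hypothetical stand-in; the real scenario would fetch the dataset."""

    def get_instances(self, output_path: str) -> List[StubInstance]:
        # output_path is unused here; it only mirrors a typical signature.
        return [StubInstance(text="Example patient note and question", split=TEST_SPLIT)]


def test_medcalc_bench_scenario_get_instances() -> None:
    scenario = StubMedCalcBenchScenario()
    instances = scenario.get_instances(output_path="unused")
    # The two checks named in the PR description:
    assert len(instances) > 0
    assert instances[0].split == TEST_SPLIT
```

Running this file under pytest would collect and execute the single `test_` function.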

Documentation formatting:

Changed the docstring in the MATHScenario class to use a raw string for improved formatting.
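For context, a small illustration of why a raw docstring can matter when the text contains backslashes. The LaTeX fragment is made up for this example; the actual MATHScenario docstring is not shown in this thread.

```python
def plain_docstring() -> None:
    """Simplify \frac{a}{b}."""


def raw_docstring() -> None:
    r"""Simplify \frac{a}{b}."""


# In the plain string, "\f" is interpreted as a form-feed character,
# so the LaTeX command is silently corrupted.
assert "\x0c" in plain_docstring.__doc__
assert "\\frac" not in plain_docstring.__doc__

# In the raw string, the backslash is preserved as written.
assert "\\frac" in raw_docstring.__doc__
```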

@yifanmai @MiguelAFH
Could you please review this PR?

chakravarthik27 · 2025-11-13T06:47:36Z

Hi @yifanmai, @MiguelAFH

MedCalc-Bench v1.0 is returning a 404 error here: https://huggingface.co/datasets/ncbi/MedCalc-Bench-v1.0.
I've switched to https://huggingface.co/datasets/ncbi/MedCalc-Bench-v2.0.

Could you please review this PR?

Thanks.

yifanmai · 2025-12-03T21:29:17Z

The link you sent https://huggingface.co/datasets/ncbi/MedCalc-Bench-v1.0 does not return a 404. Could you clarify what you meant by this?

As for upgrading to MedCalc-Bench-v2.0, I am OK with this, but it should be a new separate run spec function in order to maintain reverse compatibility. Users running evals using the existing MedCalc-Bench should not see any changes.
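One way to picture the compatibility requirement, as a sketch only: the existing run spec name keeps pointing at v1.0, and the newer dataset is exposed under a separate name. The registry, the decorator, and the name `medcalc_bench_v2` are hypothetical and are not HELM's actual API.

```python
from typing import Callable, Dict

RunSpecFn = Callable[[], Dict[str, str]]

# Hypothetical registry mapping run spec names to builder functions.
RUN_SPEC_REGISTRY: Dict[str, RunSpecFn] = {}


def register(name: str) -> Callable[[RunSpecFn], RunSpecFn]:
    """Register a builder under a name, refusing to overwrite an existing one."""

    def wrap(fn: RunSpecFn) -> RunSpecFn:
        if name in RUN_SPEC_REGISTRY:
            raise ValueError(f"run spec {name!r} is already registered")
        RUN_SPEC_REGISTRY[name] = fn
        return fn

    return wrap


@register("medcalc_bench")
def get_medcalc_bench_spec() -> Dict[str, str]:
    # Unchanged: existing users still get the v1.0 dataset.
    return {"name": "medcalc_bench", "dataset": "ncbi/MedCalc-Bench-v1.0"}


@register("medcalc_bench_v2")  # hypothetical name for the new run spec
def get_medcalc_bench_v2_spec() -> Dict[str, str]:
    return {"name": "medcalc_bench_v2", "dataset": "ncbi/MedCalc-Bench-v2.0"}
```

Because the old name is never redefined, anyone running `medcalc_bench` sees the same dataset as before.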

chakravarthik27 · 2025-12-04T07:37:30Z

> The link you sent https://huggingface.co/datasets/ncbi/MedCalc-Bench-v1.0 does not return a 404. Could you clarify what you meant by this?
>
> As for upgrading to MedCalc-Bench-v2.0, I am OK with this, but it should be a new separate run spec function in order to maintain reverse compatibility. Users running evals using the existing MedCalc-Bench should not see any changes.

Hi @yifanmai

A few weeks ago, I encountered a 404 error and saw the dataset for version 2.0. Since then, versions 1.1 and 1.2 have also been added. As you suggested, I will create new run specifications for medcalc_bench_v1.1 and medcalc_bench_v1.2.

Thanks

Regards
@chakravarthik27

yifanmai · 2025-12-04T17:49:50Z

Great, thanks for the update.

nikhilk7153 · 2025-12-15T04:13:24Z

Hi, I was going to raise an issue suggesting the new MedCalc dataset, but it seems someone got here before me. I have made a few more changes to the MedCalc-Bench dataset since v1.2, and you can find the newest dataset here: https://github.com/nikhilk7153/MedCalc-Bench-Verified.

All updates will be made in this new repo. MedCalc-Bench Verified is an updated version of v1.2. You can find the changes made for the Verified version in its release: https://github.com/nikhilk7153/MedCalc-Bench-Verified/releases/tag/MedCalc-Bench-Verified

chakravarthik27 · 2025-12-21T05:38:09Z

Hi @yifanmai

Can you please review this PR?

Thanks

yifanmai · 2026-01-05T18:40:27Z

src/helm/benchmark/scenarios/math_scenario.py


 class MATHScenario(Scenario):
-    """
+    r"""


Revert unrelated change.

src/helm/benchmark/run_specs/medhelm_run_specs.py

src/helm/benchmark/scenarios/medcalc_bench_scenario.py

nikhilk7153 · 2026-01-05T18:56:04Z

Are there any plans to update MedHELM, @yifanmai? Would it be possible to use MedCalc-Bench Verified instead of the v1.0 that is currently used? We have fixed annotation and ground-truth label issues in approximately 1/3 of the dataset, so re-running would give a more accurate picture of the landscape.

yifanmai · 2026-01-05T18:59:06Z

I think this is more of a question for @MiguelAFH - the official evals and results are maintained by them, so it depends on whether there is funding and bandwidth available for this.

chakravarthik27 · 2026-01-13T14:29:17Z

Hi @yifanmai,

I have made the requested updates. Could you please review this pull request?

Thanks

Regards
@chakravarthik27
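The commit titles on this PR mention consolidating the versions into a single function with a version parameter. A rough sketch of that idea follows. The function name, the naming scheme for non-default versions, and the assumption that v1.1 and v1.2 follow the same `ncbi/MedCalc-Bench-v<version>` dataset ID pattern as the v1.0 and v2.0 URLs quoted above are all illustrative and not taken from the merged code.

```python
from typing import Dict

# Versions named in this thread; which ones the merged code accepts is an assumption.
SUPPORTED_VERSIONS = ("1.0", "1.1", "1.2")
DEFAULT_VERSION = "1.0"  # keeps existing runs unchanged


def medcalc_bench_spec(version: str = DEFAULT_VERSION) -> Dict[str, str]:
    """Hypothetical builder: one function, parameterized by dataset version."""
    if version not in SUPPORTED_VERSIONS:
        raise ValueError(f"unsupported version {version!r}; expected one of {SUPPORTED_VERSIONS}")
    # Only non-default versions get a suffix, so the original name is preserved.
    name = "medcalc_bench" if version == DEFAULT_VERSION else f"medcalc_bench:version={version}"
    return {"name": name, "dataset": f"ncbi/MedCalc-Bench-v{version}"}
```

Defaulting to `1.0` is what keeps the change invisible to users who never pass a version.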

MiguelAFH · 2026-01-13T17:33:16Z

> I think this is more of a question for @MiguelAFH - the official evals and results are maintained by them, so it depends on whether there is funding and bandwidth available for this.

Thanks for the heads up on the benchmark updates. We are currently working on a new release for when the paper comes out in the coming months - we will talk about this internally and let you know if there's enough bandwidth for it.

yifanmai

Looks good. Thank you!

chakravarthik27 added 10 commits August 18, 2025 10:29

Add OpenRouterClient implementation and tests

3e15251

updated: add types and HELM convention as per suggested.

ac04b19

updated: removed unnecessary code

0cfba1d

updated: to get model name from request.model instead of request.model_engine

bc99bbd

Merge branch 'stanford-crfm:main' into main

cbd676e

Merge branch 'stanford-crfm:main' into main

02120c8

fix: module error for "shc_privacy_med" and "shc_proxy_med" run specs

311bc4e

Merge branch 'stanford-crfm:main' into main

2925562

Merge branch 'stanford-crfm:main' into main

d7d8ceb

fix: medcalc_bench dataset path from huggingface

5f30c51

chakravarthik27 added 2 commits December 15, 2025 10:53

feat: add MedCalc-Bench v1.0, v1.1, and v1.2 scenarios and update dataset references

2283cc2

fix: update MedCalc-Bench dataset link and improve v1.0 description

8848304

yifanmai requested changes Jan 5, 2026

refactor: consolidate MedCalc-Bench scenario versions into a single function with version parameter

5bb3014

chakravarthik27 added 2 commits January 13, 2026 14:33

fix: update get_medcalc_bench_spec to handle version parameter and adjust run spec name

79c8554

removed: unrelated to this pr

9ffc09b

yifanmai approved these changes Jan 13, 2026

yifanmai changed the title ~~chakravarthik27/fix medcalc bench dataset path~~ Support newer versions of MedCalc-Bench Jan 13, 2026

yifanmai merged commit cc517af into stanford-crfm:main Jan 13, 2026
6 of 12 checks passed

chakravarthik27 deleted the chakravarthik27/fix_medcalc_bench_dataset branch January 14, 2026 06:32


4 participants
